Step 1b added: run BOTH gates before claiming Goal-Lx PASS.
- Gate 1: `winml config` diff against shipped recipe (strip `_note`).
- Gate 2: `winml build` baseline on main without `-c`.
If both gates show parity, the recipe is catalog-only — do not file.
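Gate 1 and the combined verdict can be sketched in a few lines of Python. This is a minimal illustration, not part of winml or SKILL.md. The helper names (`strip_notes`, `load_stripped`, `config_delta`, `step_1b_verdict`) are hypothetical. Both configs are assumed to be JSON files already on disk. Gate 2's result is passed in as a boolean rather than computed.

```python
import json
from typing import Any


def strip_notes(value: Any) -> Any:
    """Recursively drop `_note` keys, which annotate a recipe but do not configure it."""
    if isinstance(value, dict):
        return {k: strip_notes(v) for k, v in value.items() if k != "_note"}
    if isinstance(value, list):
        return [strip_notes(v) for v in value]
    return value


def load_stripped(path: str) -> Any:
    """Load a JSON config (shipped recipe or saved `winml config` output) minus `_note` keys."""
    with open(path, encoding="utf-8") as fh:
        return strip_notes(json.load(fh))


def config_delta(shipped: Any, auto: Any, path: str = "$") -> list:
    """Gate 1 (hypothetical helper): list every path where the recipe differs from auto-config."""
    if isinstance(shipped, dict) and isinstance(auto, dict):
        diffs = []
        for key in sorted(set(shipped) | set(auto)):
            sub = f"{path}.{key}"
            if key not in auto:
                diffs.append(f"{sub}: only in shipped recipe")
            elif key not in shipped:
                diffs.append(f"{sub}: only in auto-config")
            else:
                diffs.extend(config_delta(shipped[key], auto[key], sub))
        return diffs
    # Scalars and lists are compared whole; callers strip `_note` keys beforehand.
    return [] if shipped == auto else [f"{path}: {auto!r} -> {shipped!r}"]


def step_1b_verdict(gate1_diffs: list, gate2_parity: bool) -> str:
    """Hypothetical verdict: parity on both gates means the recipe is catalog-only."""
    if not gate1_diffs and gate2_parity:
        return "catalog-only: do not file"
    return "real delta: Lx evidence may proceed"
```

A typical call would be `config_delta(load_stripped(recipe_path), load_stripped(auto_path))`. Here `auto_path` is wherever the `winml config` output was saved. That is an assumption, because the gate text does not say how the output is captured.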
Audit on 2026-06-23 found that 6 of 6 recent recipe PRs (#933, #934, #943,
#944, #945, #946) had zero CLI-surface delta over auto-config output.
All 6 closed; replacement = user runs `winml build -m <id>` directly.
SKILL.md additions:
- Step 0 Effort L0/L0★ guardrail
- Step 1b full procedure with verdict table
- Goal-axis guardrail (Lx evidence requires Step 1b real-delta)
- Step 4b trigger #8 (catalog-only escape) + next-id bump to 039
findings.json: _meta-038 with refines [_meta-013, _meta-018],
mechanism_confirmed=true, evidence cites the 6-PR audit.
PR: apple/DepthPro-hf — depth-estimation recipe (fp32, CPU)
Iter: 6 (build closure shipped iter-3 as depth_pro-002/003; this PR adds the L1-CPU evidence on top)
Producer: main agent (2026-06-23)
Claimed tier:
(Effort = L0★, Goal = L1-CPU, Outcome = L0)

Summary

This PR ships the `apple/DepthPro-hf` depth-estimation recipe. DepthPro is a 952M-param model with 3 independent DINOv2 backbones (patch + image + fov encoders) plus neck/fov/fusion stages, which makes it structurally one of the largest single-graph models in the catalog. It builds cleanly via the standard L0★ template. L1-CPU PASSes via a custom Python harness that reuses the cached artifact. The `winml perf` path triggers a re-export per invocation, which is wasteful for a 3.6 GB model (see the Diligence ladder note in §8). No source-code changes.

1. Recipe file
examples/recipes/apple_DepthPro-hf/depth-estimation_fp16_config.json
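The precision check cited in the note below was done with `onnx.load`. That check, together with the node and op-type counts quoted in the Goal-ladder table, could be reproduced against the built `model.onnx` roughly as follows. This is a minimal sketch, not the PR's own tooling. It assumes the `onnx` package is installed. The default path is only the build directory named in this PR. The helper names `op_mix` and `inventory` are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Ops treated as pure layout moves, matching the Reshape/Transpose/Slice split
# quoted in the Goal-ladder table.
LAYOUT_MOVE_OPS = ("Reshape", "Transpose", "Slice")


def op_mix(op_types: Iterable[str]) -> dict:
    """Summarize node count, unique op types, and the layout-move share."""
    counts = Counter(op_types)
    total = sum(counts.values())
    moves = sum(counts[op] for op in LAYOUT_MOVE_OPS)  # missing ops count as 0
    return {
        "nodes": total,
        "unique_op_types": len(counts),
        "layout_moves": moves,
        "layout_move_fraction": moves / total if total else 0.0,
    }


def inventory(path: str = "temp/depth_pro_build/model.onnx") -> dict:
    """Opsets, initializer dtypes, and op mix of the top-level graph."""
    import onnx  # assumed installed; imported here so op_mix needs only stdlib

    # load_external_data=False leaves the multi-GB weights in model.onnx.data;
    # initializer dtypes and node op types are readable without them.
    model = onnx.load(path, load_external_data=False)
    # ONNX spells fp32 as "FLOAT" and fp16 as "FLOAT16".
    dtypes = Counter(
        onnx.TensorProto.DataType.Name(t.data_type) for t in model.graph.initializer
    )
    return {
        "opsets": {imp.domain or "ai.onnx": imp.version for imp in model.opset_import},
        "initializer_dtypes": dict(dtypes),
        **op_mix(node.op_type for node in model.graph.node),  # top-level nodes only
    }
```

For the figures quoted in the table, 1378 / 2822 ≈ 0.488, which rounds to the 49% layout-move share.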
The `fp16_*` filename is cosmetic per `_meta-014`. The recipe ships fp32 (102 FLOAT32 initializers and 0 FLOAT16, verified by `onnx.load`).

Recipe input shape: `pixel_values` `[1, 3, 1536, 1536]`, float32, range [0, 1]. Outputs: `predicted_depth` `[1, 1536, 1536]` + `field_of_view` `[1]`.

2. README index row
`examples/recipes/README.md`, row to add:

`| apple/DepthPro-hf | depth-estimation | single (no composite) | recipe |`

3. Build output directory + artifact inventory
`temp/depth_pro_build/` (gitignored; referenced by path for reviewer re-execution):

- `model.onnx`
- `model.onnx.data`
- `export.onnx` + `.data`
- `optimized.onnx` + `.data`
- `analyze_result.json`
- `export_htp_metadata.json`
- `winml_build_config.json`

External-data layout check (`_meta-023`): `model.onnx` and `model.onnx.data` are co-located in the same directory. PASS.

4. Build log
`temp/depth_pro_build.log`: Build complete in 758.0s (export 375s + optimize 355s). Build artifact path: `temp/depth_pro_build/`.

5. Appended findings
Per-model: `model_knowledge/depth_pro.json`

Skill-meta: no new `_meta-NNN` findings in this PR (Lane B).

6. Optimum-coverage probe verdict
Verdict: WINML-ONLY.
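The probe behind this verdict can be sketched as follows. It assumes the registry to consult is Optimum's private `TasksManager._SUPPORTED_MODEL_TYPE` dict, as named in the rationale below. Because that attribute is private, its name or layout may differ across Optimum releases. The helper names are hypothetical and not winml code.

```python
from typing import Mapping, Optional


def optimum_registers(model_type: str, registry: Mapping) -> bool:
    """True if Optimum's model-type registry knows this model_type.

    Optimum's registry keys appear to use dashes ("depth-pro") where HF configs
    use underscores ("depth_pro"), so both spellings are checked.
    """
    candidates = {model_type, model_type.replace("_", "-")}
    return any(name in registry for name in candidates)


def load_optimum_registry() -> Optional[Mapping]:
    """Fetch the private registry, or None if Optimum is missing or has changed."""
    try:
        from optimum.exporters.tasks import TasksManager
    except ImportError:
        return None
    # Private attribute: may be renamed or restructured in other Optimum releases.
    return getattr(TasksManager, "_SUPPORTED_MODEL_TYPE", None)
```

When `load_optimum_registry()` returns a mapping, a `False` from `optimum_registers("depth_pro", registry)` is the condition the WINML-ONLY verdict records.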
The `depth_pro` model_type is not registered in Optimum's `TasksManager._SUPPORTED_MODEL_TYPE`. What makes export work is winml's `register_onnx_overwrite` decorator at `src/winml/modelkit/models/hf/depth_pro.py`. Despite the WINML-ONLY classification, THIS PR needs no code because the per-arch file already exists. The recipe is a pure consumer of the existing registration. Effort L0★ confirmed.

7. Claimed (Effort, Goal, Outcome) tier
- Effort = L0★ (`depth_pro.py` already exists from a prior iter; this PR adds only the recipe + finding append)
- Goal = L1-CPU
- Outcome = L0 (… `_meta-015` analogue for depth-estimation)

8. Goal-ladder verdict table (per `_meta-018`)

- Build: `winml build` → `model.onnx` + `model.onnx.data` co-located; opset 17, fp32, 2822 nodes, 19 unique op types; Build complete in 758.0s. Log: `temp/depth_pro_build.log`.
- L1-CPU: PASS. `pixel_values [1,3,1536,1536]` input; warmup 29582 ms (cold); throughput 0.035 samples/sec on CPU. Custom Python harness per `_meta-017` (avoids re-export). Log: `temp/depth_pro_perf_cpu.log`; script: `temp/depth_pro_perf.py`.
- NPU: … `_meta-016`. The graph is 49% layout-move ops (Reshape/Transpose/Slice = 1378/2822 per `depth_pro-003`), so QNN-NPU would likely be heavily move-bound even when available.
- L2: … `_meta-018`. A PT-vs-ONNX comparison would need DepthPro pipeline reconstruction (preprocessor → 3-backbone forward → neck → fusion → head); script not written this turn.
- L3: the `winml eval` task registry does not include `depth-estimation` (analogous to translation per `_meta-015`).

Short-circuit honored: no FAIL anywhere. Per `_meta-018`, L2/L3 rows that are deferred or blocked do not halt the march. Honest ceiling is L1-CPU PASS.

Diligence ladder (`_meta-037`), invoked during the L1 attempt:

- `depth_pro.json`: no prior perf workaround documented for this model.
- `winml config`: N/A, recipe already exists.
- `--ep-options` retry: N/A, CPU not failing.
- `value_range` / shape pinning: recipe shape already pinned to `[1,3,1536,1536]`.
- `winml perf` triggered a full re-export on every invocation, about 13 min each time, because each `uv run winml perf` rebuilds the artifact. Switched to a direct `onnxruntime.InferenceSession` against the cached `temp/depth_pro_build/model.onnx`. Loaded in 15.44s, ran 3 iters in 86s total.

Feature gap from step 6 trigger:
`winml perf` should accept a pre-built artifact path (e.g. a proposed `--artifact temp/depth_pro_build/model.onnx` flag) and skip the build phase entirely. For a 3.6 GB model, paying the build cost on every perf invocation is prohibitive. Captured under `depth_pro-003` `feature_gaps_filed[]` as a follow-up.

9. Methodology-evolution declaration (per `_meta-031`)

No NEW methodology friction in this PR. The custom-harness pattern is `_meta-017`. The `winml perf` re-export cost is a new observation, but it rolls into the existing `_meta-017` gotcha rather than a fresh `_meta-NNN`. Triggers: … `--warmup-iterations` vs `--warmup`, recovered via `winml perf --help`; not a doc-cited flag.

Reviewer should confirm "no methodology friction observed" per the `_meta-031` anti-trigger.

Reviewer hand-off package: Step 6 9-item self-check
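To help a reviewer re-execute the L1-CPU figures, here is a minimal sketch of a cached-artifact CPU timing harness like the one described in the §8 diligence ladder. It is not the PR's `temp/depth_pro_perf.py`. It assumes `numpy` and `onnxruntime` are installed and that `pixel_values` is the graph's only input. Random values in [0, 1) stand in for preprocessed images. The helper names `time_runs` and `main` are hypothetical.

```python
import time
from typing import Callable


def time_runs(run: Callable[[], object], warmup: int = 1, iters: int = 3) -> dict:
    """Run `warmup` iterations (timed separately), then time `iters` measured runs."""
    t0 = time.perf_counter()
    for _ in range(warmup):
        run()
    warmup_s = time.perf_counter() - t0

    t1 = time.perf_counter()
    for _ in range(iters):
        run()
    total_s = time.perf_counter() - t1
    return {
        "warmup_ms": warmup_s * 1000.0,
        "total_s": total_s,
        # Batch size is 1, so one iteration == one sample.
        "samples_per_sec": iters / total_s if total_s > 0 else float("inf"),
    }


def main(model_path: str = "temp/depth_pro_build/model.onnx") -> dict:
    import numpy as np  # assumed installed
    import onnxruntime as ort  # assumed installed

    t_load = time.perf_counter()
    # Loading the existing artifact directly skips the export/optimize phases
    # that each `winml perf` invocation would otherwise repeat.
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    print(f"loaded in {time.perf_counter() - t_load:.2f}s")

    rng = np.random.default_rng(0)
    feed = {"pixel_values": rng.random((1, 3, 1536, 1536), dtype=np.float32)}
    stats = time_runs(lambda: session.run(None, feed), warmup=1, iters=3)
    print(stats)
    return stats
```

Calling `main()` loads the cached artifact once, then times one warmup run and three measured runs. With batch size 1, throughput is iterations divided by measured seconds. The reported 3 iters in 86s give 3 / 86 ≈ 0.035 samples/sec, which matches the L1-CPU row.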